NSF PAR Search | NSF Public Access Repository

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Su, Hongjin; Yen, Howard; Xia, Mengzhou; Shi, Weijia; Muennighoff, Niklas; Wang, Han-yu; Liu, Haisu; Shi, Quan; Siegel, Zachary S; Tang, Michael; et al. (April 2025, International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available April 24, 2026
Teaching LLMs to Abstain across Languages via Multilingual Feedback

Feng, Shangbin; Shi, Weijia; Wang, Yike; Ding, Wenxuan; Ahia, Orevaoghene; Li, Shuyue Stella; Balachandran, Vidhisha; Sitaram, Sunayana; Tsvetkov, Yulia (December 2024, EMNLP)

Free, publicly-accessible full text available December 1, 2025
Resolving Knowledge Conflicts in Large Language Models

Wang, Yike; Feng, Shangbin; Wang, Heng; Shi, Weijia; Balachandran, Vidhisha; He, Tianxing; Tsvetkov, Yulia (October 2024, COLM)

Full Text Available
Do Membership Inference Attacks Work on Large Language Models?

Duan, Michael; Suri, Anshuman; Mireshghallah, Niloofar; Min, Sewon; Shi, Weijia; Zettlemoyer, Luke; Tsvetkov, Yulia; Choi, Yejin; Evans, David; Hajishirzi, Hannaneh (October 2024, COLM)

Full Text Available
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding

Shi, Weijia; Han, Xiaochuang; Lewis, Mike; Tsvetkov, Yulia; Zettlemoyer, Luke; Yih, Wen-tau (June 2024, NAACL)

Language models (LMs) often struggle to pay enough attention to the input context and generate texts that are unfaithful or contain hallucinations. To mitigate this issue, we present context-aware decoding (CAD), which follows a contrastive output distribution that amplifies the difference between the output probabilities when a model is used with and without context. Our experiments show that CAD, without additional training, significantly improves the faithfulness of different LM families, including OPT, GPT, LLaMA, and FLAN-T5 for summarization tasks (e.g., 14.3% gain for LLaMA in factuality metrics). Furthermore, CAD is particularly effective in overriding a model's prior knowledge when it contradicts the provided context, leading to substantial improvements in tasks where resolving the knowledge conflict is essential.
Full Text Available
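
The contrastive adjustment described in the CAD abstract above can be illustrated with a small sketch. It assumes the contrast is applied in logit space with a weight `alpha`, so that tokens favored by the context are boosted relative to the context-free prediction. The function names, the `alpha` parameter, and the toy logits are illustrative assumptions, not code from the paper.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def context_aware_distribution(logits_with_context, logits_without_context, alpha=0.5):
    """Sketch of a contrastive next-token distribution.

    Assumption: the adjusted logits are
        (1 + alpha) * logits_with_context - alpha * logits_without_context,
    so alpha = 0 recovers ordinary decoding with the context.
    """
    adjusted = [
        (1 + alpha) * with_ctx - alpha * without_ctx
        for with_ctx, without_ctx in zip(logits_with_context, logits_without_context)
    ]
    return softmax(adjusted)


# Toy vocabulary of three tokens: the context favors token 0,
# while the model's prior (no context) favors token 2.
with_ctx = [2.0, 1.0, 0.0]
without_ctx = [0.0, 1.0, 2.0]

plain = softmax(with_ctx)
contrastive = context_aware_distribution(with_ctx, without_ctx, alpha=1.0)
```

In this toy case the contrastive distribution puts more mass on token 0 than ordinary decoding does, which mirrors the abstract's claim that the method can override a prior that contradicts the provided context.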
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models

Feng, Shangbin; Shi, Weijia; Bai, Yuyang; Balachandran, Vidhisha; He, Tianxing; Tsvetkov, Yulia (May 2024, International Conference on Learning Representations)

By design, large language models (LLMs) are static general-purpose models, expensive to retrain or update frequently. As they are increasingly adopted for knowledge-intensive tasks, it becomes evident that these design choices lead to failures to generate factual, relevant, and up-to-date knowledge. To this end, we propose Knowledge Card, a modular framework to plug in new factual and relevant knowledge into general-purpose LLMs. We first introduce knowledge cards: specialized language models trained on corpora from specific domains and sources. Knowledge cards serve as parametric repositories that are selected at inference time to generate background knowledge for the base LLM. We then propose three content selectors to dynamically select and retain information in documents generated by knowledge cards, specifically controlling for relevance, brevity, and factuality of outputs. Finally, we propose two complementary integration approaches to augment the base LLM with the (relevant, factual) knowledge curated from the specialized LMs. Through extensive experiments, we demonstrate that Knowledge Card achieves state-of-the-art performance on six benchmark datasets. Ultimately, the Knowledge Card framework enables dynamic synthesis and updates of knowledge from diverse domains. Its modularity will ensure that relevant knowledge can be continuously updated through the collective efforts of the research community.
Full Text Available
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

Min, Sewon; Gururangan, Suchin; Wallace, Eric; Shi, Weijia; Hajishirzi, Hannaneh; Smith, Noah; Zettlemoyer, Luke (May 2024, ICLR)

Full Text Available
Toward Human Readable Prompt Tuning: Kubrick’s The Shining is a good movie, and a good prompt too?

https://doi.org/10.18653/v1/2023.findings-emnlp.733

Shi, Weijia; Han, Xiaochuang; Gonen, Hila; Holtzman, Ari; Tsvetkov, Yulia; Zettlemoyer, Luke (January 2023, Association for Computational Linguistics)

Large language models can perform downstream tasks in a zero-shot fashion, given natural language prompts that specify the desired behavior. Such prompts are typically hand engineered, but can also be learned with gradient-based methods from labeled data. However, it is underexplored what factors make the prompts effective, especially when the prompts are in natural language. In this paper, we investigate common attributes shared by effective prompts in classification problems. We first propose a human readable prompt tuning method (FluentPrompt) based on Langevin dynamics that incorporates a fluency constraint to find a distribution of effective and fluent prompts. Our analysis reveals that effective prompts are topically related to the task domain and calibrate the prior probability of output labels. Based on these findings, we also propose a method for generating prompts using only unlabeled data, outperforming strong baselines by an average of 7.0% accuracy across three tasks.
Full Text Available
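
The abstract above describes sampling prompts with Langevin dynamics under a fluency constraint. The sketch below shows only the generic update rule (a gradient step on an energy plus Gaussian noise) on a toy problem. The quadratic "task" and "fluency" terms, the weight `lam`, and all function names are hypothetical stand-ins chosen for illustration; the losses used by the actual method are not reproduced here.

```python
import math
import random


def energy_grad(e, task_target, fluent_target, lam):
    """Gradient of a toy energy:
    0.5 * ||e - task_target||^2 + lam * 0.5 * ||e - fluent_target||^2.
    Both terms are illustrative stand-ins for a task loss and a fluency loss.
    """
    return [
        (ei - ti) + lam * (ei - fi)
        for ei, ti, fi in zip(e, task_target, fluent_target)
    ]


def langevin_step(e, grad, step_size, noise_scale, rng):
    """One Langevin update: gradient descent plus scaled Gaussian noise."""
    return [
        ei - step_size * gi + noise_scale * math.sqrt(2 * step_size) * rng.gauss(0.0, 1.0)
        for ei, gi in zip(e, grad)
    ]


def sample_prompt_embedding(task_target, fluent_target, lam=1.0,
                            step_size=0.1, noise_scale=1.0, steps=200, seed=0):
    """Run Langevin updates from the origin and return the final point."""
    rng = random.Random(seed)
    e = [0.0] * len(task_target)
    for _ in range(steps):
        grad = energy_grad(e, task_target, fluent_target, lam)
        e = langevin_step(e, grad, step_size, noise_scale, rng)
    return e


# With the noise switched off, the update reduces to gradient descent and
# settles at the minimum of the combined energy.
noiseless = sample_prompt_embedding([2.0, 0.0], [0.0, 2.0], lam=1.0, noise_scale=0.0)

# With noise on, repeated runs with different seeds give a spread of samples
# around that minimum rather than a single point.
noisy = sample_prompt_embedding([2.0, 0.0], [0.0, 2.0], lam=1.0, noise_scale=1.0, seed=1)
```

Raising `lam` in this toy setup pulls the samples toward the fluency target, which is one way to picture how a fluency term can trade off against task performance.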
Nonparametric Masked Language Modeling

https://doi.org/10.18653/v1/2023.findings-acl.132

Min, Sewon; Shi, Weijia; Lewis, Mike; Chen, Xilun; Yih, Wen-tau; Hajishirzi, Hannaneh; Zettlemoyer, Luke (January 2023, ACL Findings)

Full Text Available
Retrofitting Contextualized Word Embeddings with Paraphrases

https://doi.org/10.18653/v1/D19-1113

Shi, Weijia; Chen, Muhao; Zhou, Pei; Chang, Kai-Wei (January 2019, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP))

Contextualized word embeddings, such as ELMo, provide meaningful representations for words and their contexts. They have been shown to have a great impact on downstream applications. However, we observe that the contextualized embeddings of a word might change drastically when its contexts are paraphrased. As these embeddings are over-sensitive to the context, the downstream model may make different predictions when the input sentence is paraphrased. To address this issue, we propose a post-processing approach to retrofit the embedding with paraphrases. Our method learns an orthogonal transformation on the input space of the contextualized word embedding model, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the proposed method significantly improves ELMo on various sentence classification and inference tasks.
Full Text Available